Measuring and Modeling Behavioral Decision Dynamics in Collective Evacuation 
Identifying and quantifying factors influencing human decision making remains an outstanding challenge,
impacting the performance and predictability of social and technological systems. In many
cases, system failures are traced to human factors including congestion, overload, miscommunication, 
and delays. Here we report results of a behavioral network science experiment, targeting decision 
making in a natural disaster. In each scenario, individuals are faced with a forced "go" versus "no 
go" evacuation decision, based on information available on competing broadcast and peer-to-peer 
sources. In this controlled setting, all actions and observations are recorded prior to the decision, 
enabling development of a quantitative decision making model that accounts for the disaster like- 
lihood, severity, and temporal urgency, as well as competition between networked individuals for 
limited emergency resources. Individual differences in behavior within this social setting are corre- 
lated with individual differences in inherent risk attitudes, as measured by standard psychological 
assessments. Identification of robust methods for quantifying human decisions in the face of risk 
has implications for policy in disasters and other threat scenarios. 



INTRODUCTION 

The development of new communication technologies 
enables rapid information dissemination and decision 
making among groups of individuals, but it also creates 
new challenges in the coordination of collective behavior.
For example, the adoption of social networking technologies
such as Twitter and Facebook can empower the
masses but makes them hard to control [1]-[5]. More generally,
the advent of contemporary network technologies
has brought with it a new set of fragilities stemming from 
the complexity of human behavior: people rarely behave 
optimally, randomly, or uniformly, as often naively as- 
sumed in technological design and policy development. 

Within the field of network science, the study of so- 
cial networks plays an increasingly important role in 
method development and associated applications, with 
widespread implications in marketing, politics, education,
epidemics, and disasters. Considerable effort is directed
towards understanding how information diffuses
through social groups [9]-[14], with particular emphasis
on the role of news websites [13], blogs [15], Facebook
[17], Twitter [18], and other social media [TOll^.

As information diffuses, individuals can display a range
of decision making behaviors driven by new information.
Phenomena of particular interest include (1) the dynamics
of cascading behavior, which can explain how and
why fads emerge [21] or rumors spread so quickly [22], [23],
and (2) the role that individuals play as "spreaders" in
facilitating the propagation of this behavior pH - BSj, or
similarly the role that "homophily" can play in abrogating
uptake of a behavior [27]. Social epidemics, much
like their biological counterparts [28]-[31], are often modeled
as single- [32] or multi-stage [33] complex contagion
processes [5iH5B].

Recent theoretical investigations have examined how 
this information exchange leads to collective action. In 
one class of models, individual agents occupy nodes on a 
network, and a set of rules defines information propaga- 
tion dynamics and individual decision making behavior 
(e.g., see [231IM1I37]). Complementary data-driven investigations
describe computational algorithms that begin
to unravel rules for influence and decision making from
large databases, such as Twitter, Facebook, and wireless
communication networks (e.g., [6], [26], [38], [39]). In most
cases the databases identify decisions that are made and 
delineate links between network members. However, in- 
formation about the factors that drive human decisions, 
including individual observations, attention, history, per- 
sonality, and risk perception is generally unavailable. 

This paper focuses on a critical link between simulation
studies and empirical observations of large scale
networks. Specifically, we conducted a behavioral experiment
involving a group of 50 individuals in a computer
laboratory. Because human behavior is often far from
what is predicted by idealized models, experimental observations
in "live" and controlled environments are essential
for improved understanding and modeling of social
phenomena. Our work adapts the framework of Kearns
et al. [40ll43j, who have conducted a series of "behavioral
network science" (BNS) experiments that have focused
on collective problem solving tasks, such as abstract
graph coloring problems or economic investment
games. These experiments have demonstrated that "human
subjects perform remarkably well at the collective
level" in a number of tasks and scenarios, both competitive
and cooperative [43]. However, disasters and other
crisis situations often display the opposite effect [HHIH].
Social interactions can lead to a "mob mentality" [^51 -
[51] that hinders evacuation and may lead to injury and
violence. Moreover, associated spatiotemporal clustering
of departure times can lead to traffic congestion and
delays [SMSl].

Therefore, in contrast to previous BNS research in- 
volving idealized, abstract games, our investigations in- 
volve decision making in a threat scenario. Specifically, 
our study is set in the context of an impending natu- 
ral disaster, where each individual occupies a node in a 
social network and must decide whether or not to evacu- 
ate. The experiment is conducted for a sequence of time- 
evolving disaster scenarios. In each scenario, individuals 
receive real time updates from a centralized information 
source about the likelihood, severity, and timing of a dis- 
aster that threatens their virtual community. Individuals 
also receive social information regarding decisions of their 
neighbors, and availability of space in a virtual shelter. 
Thus, participants face a tradeoff in competing types of 
information (i.e., centralized broadcast information ver- 
sus decentralized social information) in a laboratory set- 
ting that emphasizes risk and loss. 

Compared to large data driven studies, the experiment 
provides a much more complete, quantitative set of mea- 
surements, enabling us to assess factors and isolate ten- 
sions that arise in human decision making. In addition 
to observing the ultimate evacuation decisions, our ex- 
perimental setup allows us to monitor the behavior of 
individuals as they gather information. Prior to the ex- 
periment, we also assess individual personality profiles 
and risk attitudes using standardized tests. The ability 
to acquire this extensive set of static and dynamic mea- 
surements both prior to and during the decision making 
process allows us to quantify links between psychological 
assessments and heterogeneity in group behavior. 

A primary outcome of this study is the identification of 
a decision model for evacuation behavior based on empir- 
ical observations. The model output fits the observations 
remarkably well and can be used to quantify individual 
differences in decision dynamics. The empirical model re- 
duces the catalog of scenarios and observations to a few 
key parameters involving an overall multiplicative rate 
factor for evacuation, an average decision threshold based 
on the disaster likelihood, and variability about the av- 
erage threshold, reflecting how consistently the decision 
making threshold was applied. The model enables us to 
isolate and compare two sources of urgency in the exper- 
iment that differentially impact observed behavior: time 
pressure for the evacuation decision and competition for 
shelter space. This empirical model stands in contrast to
a set of models typically used in numerical simulations
or large scale, data-driven studies that treat decisions as
random, optimal, or based on a threshold applied to a
state variable representing opinion, which is updated by
an assumed interaction rule (e.g., [21], [37], [49], [50], [54], [55]).
While our experiment is admittedly well removed from 
a true natural disaster, it allows us to isolate and quan- 
tify tensions that arise in a crisis, in a manner that would 
not be possible during an actual event. Furthermore, the 
experimental design takes into account known psychological
factors associated with risk perception, threat,
and information processing [56]-[59]. A key component
of behavioral network science is to use the observed human
behavior as inspiration for the development of novel
computational models of behavior, which can in turn be
tested in future experiments. This spiral development of
model-experiment-model or experiment-model-experiment
may be used to develop optimal strategies for disseminating
information during a disaster, and ensuring sufficient
allocation of resources for disaster response.



EXPERIMENT 

On May 18, 2012 an experiment was conducted at 
the University of California, Santa Barbara (UCSB) in 
which 50 student participants within a virtual commu- 
nity each decided if and when to evacuate from impend- 
ing natural disasters. All participants provided written 
informed consent, and the experimental protocol was ap- 
proved by the Institutional Review Board of UCSB. Prior 
to taking part in the study, the personality profile of each 
participant was measured using the Big Five Inventory 
(BFI-44) questionnaire [60]-[62], and the risk preferences
of each participant were also measured in six domains (social,
investment, gambling, health & safety, ethical, and
recreational) using a Domain Specific Risk Attitude Scale
[63], [64]. The Big Five Inventory is a commonly used set
of 44 questions that enables the assessment of an indi- 
vidual's personality along the following dimensions: ex- 
traversion, neuroticism, openness, conscientiousness, and 
agreeableness. The Big Five is used extensively in psy- 
chological research as well as in translational applications 
such as the assessment of learning styles and employee 
placement. The Domain Specific Risk Attitude Scale is 
used in psychological research to assess risk perception 
and risk behavior, to predict human behavior, and to 
develop policy in areas such as health and natural haz- 
ards. Administration of each questionnaire lasted ap- 
proximately 7 minutes. 

Individuals participated in 47 scenarios (runs) that 
lasted one minute each. At the beginning of each sce- 
nario, each participant was given 100 monetary "points" 
that were at risk from a simulated disaster. During 
each scenario, participants were provided with informa- 
tion about the progression of the disaster, and they were 
offered the opportunity to evacuate from this disaster (a 
binding decision) and occupy one of a limited number of
spaces in a virtual disaster shelter. Depending on their
decision and the outcome of the disaster, they could lose 
some or all of their monetary points. The magnitude of 
the loss was a function of whether or not the individual 
successfully evacuated to the shelter, and whether or not 
the disaster struck. The total amount paid to a partici- 
pant at the end of the experiment was a function of their 
cumulative score over the 47 runs. The running cumu- 
lative scores of all of the participants were ranked and 
displayed on a leader board at the front of the room. 



Experiment Layout 

The primary objective of this project was to under- 
stand the way in which individual decision makers use 
and share information, and how this information leads 
to collective action of the group as a whole. Of par- 
ticular interest was obtaining insight into the influence 
of competing sources of information on individual and 
group behavior. 

To reach these objectives, we employ an experimental
setup derived from that of Kearns et al. [JDHiH]. We
customize the computational framework and user interface
to our evacuation problem. Each participant sits
in front of a computer screen (see Figure 1A) containing
two tabbed windows, labeled "Disaster Information" and
"Social Information." The participant may only view one
window at a time and can switch between the two
sources of information by clicking on the tabs.

The Disaster Tab, shown in Figure 1B, provides participants
with information about the simulated time-evolving
disaster. At the top of this tab is a disaster
progress bar, which incrementally turns blue as time goes 
by; a red box around the scenario progress bar signi- 
fies the time window in which the disaster could strike. 
The likelihood that the evolving disaster will strike the 
community is presented in real time as the proportion of 
filled circles (e.g., 4 out of 10 filled circles indicates a cur- 
rent probability of 40%). A loss matrix shows how many 
points an individual will lose at the end of the current 
scenario depending on the outcome of the disaster and 
the individual's final location. Finally, a button at the 
bottom of the Disaster Tab allows participants to evac- 
uate. When an individual clicks the button, they transi- 
tion from being "AtHome" to being "InTransit." If there 
is still space available in the shelter, the individual imme- 
diately transitions to being "InShelter." However, if the 
shelter is already full, the participant remains InTransit 
through the rest of the current scenario. 
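
The state transitions triggered by the evacuation button can be sketched as a small state machine. This is a minimal illustration, not the software used in the experiment: the names `Status`, `Shelter`, and `evacuate` are hypothetical, and we assume that beds are numbered in order of arrival, which is consistent with the bed numbers shown on the Social Tab.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    AT_HOME = "AtHome"
    IN_TRANSIT = "InTransit"
    IN_SHELTER = "InShelter"


@dataclass
class Shelter:
    capacity: int
    # Participant ids in order of arrival (assumed bed numbering: 1, 2, 3, ...).
    occupants: list = field(default_factory=list)

    def try_admit(self, participant_id):
        """Return the 1-based bed number, or None if the shelter is full."""
        if len(self.occupants) >= self.capacity:
            return None
        self.occupants.append(participant_id)
        return len(self.occupants)


def evacuate(status, participant_id, shelter):
    """Apply one click of the evacuation button; the decision is binding."""
    if status[participant_id] is not Status.AT_HOME:
        return status[participant_id]  # already evacuated, nothing changes
    status[participant_id] = Status.IN_TRANSIT
    if shelter.try_admit(participant_id) is not None:
        status[participant_id] = Status.IN_SHELTER
    return status[participant_id]
```

With a one-bed shelter, the first participant to click ends the scenario InShelter and any later evacuee remains InTransit.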

The Social Tab, shown in Figure 1C, allows the participant
to query the status of neighbors in their social
network by clicking on each neighbor's node. If the neighbor
is still AtHome, then the letter 'H' appears on the
neighbor node. If the neighbor is InTransit, then the letter
'T' appears. If the neighbor is in the shelter, then the
shelter space (or "bed") number that the neighbor occupies
in the shelter appears. This value provides a lower
bound on the number of beds occupied in the shelter and
is also recorded in a shelter diagram toward the bottom
of the Social Tab. The evacuation button located on the
Disaster Tab is mirrored on the Social Tab to enable participants
to make their evacuation decision irrespective of
their current tab location.

FIG. 1. A: Experimental setup at UCSB. B: Disaster Tab,
showing current status and loss table. C: Social Tab, showing
status of neighbors; in this example, neighbors have claimed
shelter spaces 2, 5, and 18, meaning that at least 18 of 35
shelter spaces have already been filled.
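
The lower bound that a participant can infer from the Social Tab is simply the largest bed number seen among queried neighbors. The sketch below uses the example from Fig. 1 (beds 2, 5, and 18 in a 35-bed shelter); the function names and the encoding of node labels as 'H', 'T', or an integer bed number are illustrative assumptions.

```python
def occupied_beds_lower_bound(neighbor_labels):
    """Largest bed number shown on any queried neighbor node (0 if none).

    neighbor_labels holds what each node displays: 'H', 'T', or an int bed number.
    """
    beds = [label for label in neighbor_labels if isinstance(label, int)]
    return max(beds, default=0)


def remaining_beds_upper_bound(neighbor_labels, capacity):
    """At most this many beds can still be free."""
    return capacity - occupied_beds_lower_bound(neighbor_labels)


labels = ["H", 2, "T", 5, 18]
print(occupied_beds_lower_bound(labels))        # 18
print(remaining_beds_upper_bound(labels, 35))   # 17
```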



Psychometrics of Participants 

Personality metrics. The Big Five Inventory measures
an individual's personality based on five characteristics:
extraversion, agreeableness, conscientiousness,
neuroticism, and openness [60]-[62]. As shown in Fig. 2,
the group of individuals that volunteered to take part in
our experiment displayed similar personality profiles to
the typical values for a similar age group [65], with the
exception of neuroticism, which was significantly lower
than in the general population.
FIG. 2. Mean and standard deviation (STD) for the Big Five
Inventory scores calculated over all 50 participants (yellow).
For comparison, we report the typical values estimated from
6076 individuals aged 21 (blue) [65]. The only significant
deviation from typical scores was neuroticism, which had a
significantly lower mean value.

Risk attitude. The risk attitude questionnaire scores
both general risk attitude and specific risk types in the
following domains: investment, health & safety, gambling,
social, ethical, and recreational. The evacuation
scenarios in this experiment were developed on
the assumption that individuals would be averse to
the loss of monetary points (financial risk), and loss of
life and property (health & safety risk). Participant responses
to questions on the Domain Specific Risk Attitude
Scale test ranged from "1" (Risk Averse) to "5"
(Risk Seeking) with "3" indicating a risk neutral attitude.
The general risk attitude distribution was risk
averse (2.60 ± 0.69). When segregated into the separate
domains, the population displayed a range of risk
attitudes summarized in Table I.



Scenario Simulation Mechanics 

Our experimental setup had several key features designed
to enable the isolation of external drivers and the
identification of tradeoffs in decision mechanics. These
features included a network structure linking participants
and constraining information diffusion, time-evolving disaster
trajectories, and scenario-to-scenario variation in
shelter capacity, time pressure, and potential risk to monetary
"points". We describe these features in greater detail
below.

TABLE I. Risk Attitude Scores in 6 Domains: mean and standard
deviation (STD) calculated over all 50 participants.

Network Structure. In our experiment, a network
structure enables participants to observe the actions of
others. In each scenario, participants are assigned at random
to a node in an underlying social network topology
designed by the researchers. This allows an individual
to have a different number of neighbors in each scenario,
and for the number of neighbors to vary by individual
in a single scenario. There were 8 networks used in the
experiment: 3 "regular" ring lattice graphs, where each
node was connected to nodes within a distance 1, 2, or
3, resulting in fixed node degree d = 2, 4, or 6, respectively;
and 5 "variable" graphs where nodes had degree
d ∈ [1, 10] with an average d = 4. More specifically, the
latter networks were generated as random graphs with
specified degree sequence {1 (×10), 2 (×8), 3 (×7), 4 (×6),
5 (×5), 6 (×4), 7 (×4), 8 (×3), 9 (×2), 10 (×1)}, according
to the algorithm specified in [66] and implemented in the
NetworkX Python library [57].
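
Both network families can be reproduced in a few lines of NetworkX. The sketch below is illustrative only: the text does not name the NetworkX function that was used, so `random_degree_sequence_graph` is an assumed stand-in for the generator cited above, and `ring_lattice` and `variable_graph` are hypothetical helper names.

```python
import networkx as nx

N = 50  # number of participants (nodes)


def ring_lattice(n, distance):
    """Ring lattice: each node is linked to every node within `distance` steps."""
    return nx.circulant_graph(n, list(range(1, distance + 1)))


# Degree sequence from the text: degree d is assigned to `count` nodes.
DEGREE_COUNTS = {1: 10, 2: 8, 3: 7, 4: 6, 5: 5, 6: 4, 7: 4, 8: 3, 9: 2, 10: 1}
DEGREE_SEQUENCE = [d for d, count in DEGREE_COUNTS.items() for _ in range(count)]


def variable_graph(seed=None):
    """Simple random graph with the prescribed degree sequence.

    Assumption: this generator may differ from the one used in the experiment.
    """
    return nx.random_degree_sequence_graph(DEGREE_SEQUENCE, seed=seed, tries=100)


regular_graphs = [ring_lattice(N, dist) for dist in (1, 2, 3)]  # degree 2, 4, 6
variable_graphs = [variable_graph(seed=s) for s in range(5)]
```

The degree sequence sums to 200 over 50 nodes, which gives the average degree of 4 stated in the text.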

Disaster Trajectories. The disaster strike probability
as a function of time t, denoted by Phit(t), was generated
in advance from a well-defined stochastic process
(details of its construction can be found in [53]). The
process corresponds to a two-dimensional progression of
a threat that moves toward a notional "target" with random
lateral motion in one dimension and monotonic forward
progression in the other dimension. The lateral
motion is simulated with a range of step sizes limited by
a prescribed volatility, while the forward motion may either
have variation or step deterministically. We record
a "Hit" (corresponding to a disaster strike) if the threat
contacts a target, or a "Miss" if the forward motion
causes the threat to pass the target without hitting. Participants
can observe a truncated value of Phit(t) on the
Disaster Tab, which is updated every second; however, the
overall trajectory is not shown. There were a total of 23
Phit(t) trajectories used in the experiment, with many of
the trajectories repeated with different settings for other
experimental variables.
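
To make the verbal description concrete, the following toy sketch simulates a threat with bounded random lateral steps and deterministic forward motion, and estimates a strike probability by Monte Carlo. It is not the construction of [53]: the uniform step distribution, the target half-width, the parameter values, and all function names are assumptions chosen only for illustration.

```python
import random


def simulate_threat(steps=60, volatility=1.0, target_halfwidth=3.0, seed=None):
    """Return (lateral position after each step, 'Hit' or 'Miss').

    Forward motion is deterministic (one unit per step); the threat reaches
    the target line after `steps` steps and hits if it is within the target.
    """
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(steps):
        x += rng.uniform(-volatility, volatility)  # bounded lateral step
        xs.append(x)
    outcome = "Hit" if abs(x) <= target_halfwidth else "Miss"
    return xs, outcome


def estimate_p_hit(x, steps_left, volatility=1.0, target_halfwidth=3.0,
                   n_samples=2000, seed=None):
    """Monte Carlo estimate of the strike probability from the current state."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        xf = x
        for _ in range(steps_left):
            xf += rng.uniform(-volatility, volatility)
        if abs(xf) <= target_halfwidth:
            hits += 1
    return hits / n_samples


def displayed_value(p_hit):
    """Truncate to one decimal place, matching a ten-circle display."""
    return int(p_hit * 10 + 1e-9) / 10
```

In this toy version the estimate sharpens toward 0 or 1 as `steps_left` shrinks, which is the feature that makes waiting for better information attractive.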



Shelter Capacity. Scenarios varied in shelter capac- 
ity. There were 5 different shelter capacity scenarios: 
50, 40, 30, 20, and 10 beds. When the number of beds 
in the scenario was less than 50 (the number of partic- 
ipants), individuals had to compete for access to these 
beds and could access information on the availability of 
shelter space through their social network. 

Time Pressure. Scenarios varied in time pressure for 
an evacuation decision. When forward motion in the 
disaster trajectory model was deterministic, the disaster
would either Hit or Miss at exactly 60 seconds. This
type of time pressure is denoted "CertainTime". For runs
with variable time steps in the disaster trajectory model,
the disaster could hit at any point between 30 and 60 seconds,
with an end time that is not known in advance to
the participants. We refer to this type of time pressure as
"VariableTime". The distinction between these types of
scenarios could be observed by participants through the 
red box around the scenario progress bar on the Disaster 
Tab. These different scenarios were designed to test how 
temporal uncertainty affected evacuation strategies. 

Potential Loss. Scenarios varied in potential risk to
monetary "points". At the start of a scenario, each participant
is staked 100 points. The amount lost due to
the disaster depends on the loss matrix, the outcome of
the scenario, and the individual's location at the end
of the run (AtHome, InShelter, or InTransit). Three loss
matrices were used in the experiment and were based
on underlying incentive structures designed by the researchers,
with the values changing between runs
to simulate varying disaster severity. The six entries in
the loss matrix (seen on the Disaster Tab) correspond to
the combinations of the three end-state possibilities and
the two disaster outcome possibilities. All loss matrices
had a point loss for an (AtHome, Miss) outcome,
with increasing loss for (InTransit, Miss) and (InShelter,
Miss). When the disaster hits, loss is minimized for
the combination (InShelter, Hit), followed by (InTransit,
Hit), and the most costly outcome is (AtHome, Hit).
Values in the loss matrix were deliberately chosen to prevent
trivial solutions, such as always evacuate or always
stay home, from being winning strategies.
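
The ordering of losses described above can be illustrated with a small expected-loss calculation. The numbers in `LOSS` are hypothetical and are not the values used in the experiment; they only respect the stated ordering. The comparison is also myopic, since it ignores the option of waiting for a better estimate of Phit and the chance that the shelter fills.

```python
# Hypothetical loss matrix (points lost out of 100); NOT the experimental values.
LOSS = {
    ("AtHome", "Miss"): 10,
    ("InTransit", "Miss"): 20,
    ("InShelter", "Miss"): 30,
    ("InShelter", "Hit"): 40,
    ("InTransit", "Hit"): 60,
    ("AtHome", "Hit"): 100,
}


def expected_loss(location, p_hit, loss=LOSS):
    """Expected points lost at a given end location for strike probability p_hit."""
    return p_hit * loss[(location, "Hit")] + (1 - p_hit) * loss[(location, "Miss")]


def break_even_p_hit(loss=LOSS, evacuated="InShelter"):
    """Strike probability at which staying home and evacuating cost the same."""
    cost_if_miss = loss[(evacuated, "Miss")] - loss[("AtHome", "Miss")]
    gain_if_hit = loss[("AtHome", "Hit")] - loss[(evacuated, "Hit")]
    return cost_if_miss / (cost_if_miss + gain_if_hit)


p_star = break_even_p_hit()
print(p_star)                               # 0.25 for the hypothetical values
print(expected_loss("AtHome", p_star))      # 32.5
print(expected_loss("InShelter", p_star))   # 32.5
```

Because neither location is better for every value of Phit, always evacuating and always staying home are both beaten by a strategy that conditions on the disaster likelihood.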

To summarize our setup and participant behavior, we
plot the cumulative behavior for two evacuation scenarios
in Figure 3. The overall behavior in each scenario can be
observed by the interaction of the Phit(t) trajectory (in
blue), the cumulative number of evacuations (grey fill),
the number of available shelter spaces (dashed line), and
the end time of the scenario. The scenario in Figure 3A
is CertainTime while the scenario in Figure 3B is VariableTime.
In both scenarios, there are 40 shelter spaces
(beds) available for the 50 participants. In Figure 3A,
we observe evidence of a stampede in which participants
evacuated for limited shelter space toward the end of the
scenario; some participants were left stranded in the state
InTransit. In Figure 3B, we observe that a large number
of participants evacuated at approximately the 30 second
point in the scenario (the first time the run might end),
but that the disaster did not happen.

FIG. 3. The collective evacuation behavior in two different
scenarios. A (CertainTime): Participants wait until the
end of the run to evacuate, waiting for more accurate information
on the likelihood that the disaster will strike; some
get stranded InTransit when the number of evacuees exceeds
the shelter capacity. B (VariableTime): More than half the
participants evacuate at approximately the 30 second mark,
which is the first time that the scenario could end.



OBSERVATIONS 

The data collected during the experiment include ev- 
ery mouse click, for all 50 participants in each of the 47 
disaster scenarios. From the data we can identify what 
each individual was seeing, when they were seeing it, 
and if and when they evacuated. This section describes
empirical observations and statistical analysis based on
these results, which are used to develop a quantitative decision
model in the next section. Key variables include
the strike probability (Phit) trajectory (Fig. 3, blue), the
loss matrix, the number of beds in the shelter (Fig. 3,
dashed black), and time pressure for the evacuation decision.

Participant rankings and scores. The success of each
participant in each scenario is depicted in Figure 4A.
We quantify a participant's success using the total point
score retained at the conclusion of the 47 runs. The three
types of successful decisions [(InShelter, Hit); (InTransit,
Hit); (AtHome, Miss)] are shown in white, while unsuccessful
decisions are shown in black. In the "hardest"
scenario (located towards the left-most side of the panel
in Figure 4A), there were zero successes in the population,
while in the "easiest" scenarios (located towards
the right-most side of the panel) a single participant was
unsuccessful in each run.

The distribution of cumulative scores is skewed: the
lowest scoring participant is far below the rest (see Figure 4B).
We analyze the differences in decision making
patterns for different individuals in more detail in a later
section entitled Individual Variation.




[Figure 4 graphic. Panel A, horizontal axis: run (ordered by difficulty), from difficult to easy. Panel B, horizontal axis: cumulative score, 2000 to 2600.]



FIG. 4. Success and distribution of cumulative scores. A
shows successful decisions in white [(InShelter, Hit); (InTransit,
Hit); (AtHome, Miss)] and unsuccessful decisions in black.
The participants are ordered by cumulative score, with the 
highest scoring at the top. The runs are reordered with the 
most difficult run on the left. B presents a histogram of the 
cumulative scores (grey), with bars showing the exact scores 
in blue. The blue bars highlight the divergence of the most 
unsuccessful participant. 



Participant score correlates with risk attitude. We hypothesized
that risk attitude in both the financial domain
and the health & safety domain would be a significant
factor in overall performance. We estimated an individual's
general financial risk attitude by averaging their
scores from both gambling and investment risk domains
[63], [64], and we estimated their overall performance using
the cumulative score. Cumulative score was significantly
correlated with health & safety risk attitude (Pearson
correlation: r = -0.31, p = 0.02) but not with financial
risk attitude (r = -0.04, p = 0.73). These results indicate
that individuals who were more averse to health &
safety risks (and therefore potentially more susceptible
to the specific influences associated with an evacuation
decision scenario) performed better than those who were
less averse.

An interesting question is whether the observed correlation
between risk attitude and performance was consistently
observed over the population or whether it was
driven by a subset of individuals. From a psychological
perspective, one meaningful segregation of individuals
into groups is a partition based on the consistency of individual
risk preferences across domains. Individuals with
consistent risk preferences across domains often display
different personality traits (which could directly lead
to differences in behavior) than those with inconsistent
risk preferences across domains [51]. To estimate the consistency
of risk attitudes we computed the standard deviation
σ of mean scores across the 6 risk domains. We separated
participants into a "consistent" group, composed
of those individuals with σ < 1 (N=31), and an "inconsistent"
group, composed of those individuals with σ > 1
(N=19). The observed correlation between performance
and health & safety risk attitude appears to be driven by
individuals with inconsistent risk attitudes (r = -0.50,
p = 0.02) rather than by individuals with consistent risk
attitudes (r = -0.18, p = 0.32). This suggests that individuals
with domain specific risk attitudes might tune
their behavior more closely to the risk structure of the
experiment.
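
The consistency split and the correlation can be computed as in the sketch below. The function names and the tiny data set are invented for illustration. We also assume the sample standard deviation, since the text does not state which estimator was used.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_by_consistency(domain_scores, threshold=1.0):
    """Partition participants by the spread of their six domain scores.

    domain_scores maps a participant id to a list of mean scores, one per domain.
    """
    consistent, inconsistent = [], []
    for pid, scores in domain_scores.items():
        sigma = statistics.stdev(scores)  # assumed estimator (sample STD)
        (consistent if sigma < threshold else inconsistent).append(pid)
    return consistent, inconsistent


# Invented example data: six domain scores per participant on the 1-5 scale.
scores = {"p1": [3, 3, 3, 3, 3, 3], "p2": [1, 5, 1, 5, 1, 5], "p3": [2, 3, 2, 3, 2, 3]}
print(split_by_consistency(scores))  # (['p1', 'p3'], ['p2'])
```

The correlation between cumulative score and health & safety risk attitude would then be computed with `pearson_r` separately within each group.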

Participants focus on Disaster Tab. Our results indicate
that participants viewed the Disaster Tab more
than the Social Tab. Individuals spent the vast majority
of their overall scenario time on the Disaster Tab, and
they made 99% of evacuation decisions while on this tab
(see Fig. 5A). Although on average participants did not
spend as much time on the Social Tab, there
was significant variation. We did not observe a significant
relationship between time spent on each tab and
performance.

Clicking behavior links to Social Tab. Click frequencies
for all participants in all scenarios are shown in Figure 5B,
which lists participants by their overall performance
(highest first). We can see from this figure that
the higher click frequency individuals spent less time on
the Disaster Tab and therefore more time on the Social
Tab. The majority of participants displayed low values
of clicking activity, indicating that they accessed social
network information infrequently. We did not observe a
significant relationship between click frequency and performance.

Network structure drives time spent on Social Tab. 
The total number of neighbors a participant could have 
in any single scenario ranged between one and ten. Fig. 6
shows that participants with many neighbors tended to 
spend more time on the Social Tab than those with few 
neighbors. This result is intuitively consistent with the 
fact that highly connected individuals could gain more 
social information than less connected individuals, and 
might therefore be predisposed to spend more time on 
the Social Tab to obtain this information. 
FIG. 5. Participants spent the majority of their time on the Disaster Tab (Frame A), but we can see those who spent more
time on the Social Tab also had higher click frequency (Frame B), likely the result of trying to gain information on remaining
shelter space.

FIG. 6. Relationship Between Number of Neighbors and Time
Spent on Social Tab. The more network connections a participant
had, the more time they spent on the Social Tab, with a
Pearson correlation r = 0.8690, p = 0.0011.

Evacuation decision tied to disaster likelihood. Disaster
likelihood values strongly influenced decision making,
as shown in Fig. 7A. Here we see each observed evacuation
grouped by Phit value at the time of evacuation.
The distribution has a sharp peak at Phit = 0.7. The cumulative
distribution is shown in Figure 7B (black) and
indicates that across all scenarios, about 90% of evacuations
occurred before Phit exceeded 80%.

High scoring individuals evacuate frequently. We observed
a significant correlation between score and number
of evacuations at Phit = 0.7 (Pearson correlation:
r = 0.59, p = 5.8 × 10^-6). The lowest scoring individuals
(see Fig. 7C, bottom) evacuate earlier and have a greater
variation in the Phit values at which they evacuate. In
Fig. 7D we present the cumulative number of evacuations,
a running sum of the data in Fig. 7C. Here we
observe a relationship between the total number of evacuations
and score: highest scoring participants (top) are
more likely to have a higher number of total evacuations
than lower scoring participants (bottom). We confirmed
this observation by calculating the Pearson correlation
between score and total number of evacuations: r = 0.39
with p = 0.005. A notable exception to this trend is
the fourth lowest scoring participant, who also has the
highest number of evacuations. Interestingly, this participant
tended to evacuate much earlier than the other participants,
resulting in many erroneous evacuations and
therefore a lower cumulative score.
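
The distribution of evacuations by Phit and its normalized running sum can be tabulated as in the sketch below. The function names and the sample values are illustrative. Evacuations are binned by the eleven displayed levels 0.0, 0.1, ..., 1.0.

```python
def evacuation_histogram(p_at_evacuation):
    """Count evacuations at each displayed level; index k means Phit = k/10."""
    counts = [0] * 11
    for p in p_at_evacuation:
        counts[round(p * 10)] += 1
    return counts


def normalized_cumulative(counts):
    """Running sum of the histogram, divided by the total number of evacuations."""
    total = sum(counts)
    running, out = 0, []
    for c in counts:
        running += c
        out.append(running / total if total else 0.0)
    return out


# Invented sample: Phit shown at the moment of each evacuation.
sample = [0.7, 0.7, 0.6, 0.8, 0.7, 0.9, 0.5]
counts = evacuation_histogram(sample)
cumulative = normalized_cumulative(counts)
print(counts.index(max(counts)) / 10)  # 0.7, the modal level in this sample
print(cumulative[8])                   # fraction of evacuations at Phit <= 0.8
```

Applied per participant instead of to the pooled data, the same two functions give the individual curves that are compared against cumulative score.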



EMPIRICAL DECISION MODEL 

Following the experimental observations described 
above, our objective is to identify a model for evacua- 
tion decision making that can be used to quantitatively 
capture the main features of population level behavior 
(this section) and the heterogeneity of individual behav- 
ior (next section). Our strategy uses data from the be- 
havioral experiment to determine a decision model that 
depends on a few key state variables in the experiment 
(e.g., the probability of the disaster event Phit). Based
on summary statistics of evacuation behavior, we iden- 
tify the functional form of the model and quantitatively 
estimate parameters. We then evaluate the accuracy of 
the model for predicting evacuations using state variables 
and detailed time trajectories from each individual run 
of the experiment. Our approach enables a concrete val- 
idation of our model, and provides direction for future 
experiments and large scale simulations of population be- 
havior in similar scenarios. 
FIG. 7. The distributions of evacuations as a function of Phit. Frame A shows the numbers of evacuations at each of the eleven
values of Phit. The distribution is peaked at Phit = 0.7. Frame B presents the normalized cumulative evacuation curves with
individuals shown in blue and the population as a whole (the running sum of the distribution in A) in black. This provides a 
summary of the heterogeneity in evacuation decisions. Frame C shows the evacuations for each individual participant. Here 
we illustrate results for the highest scoring participant at the top and the lowest scoring participant at the bottom. We see a 
trend that the higher scoring participants evacuated more consistently at Phit ~ 0.7, and the lowest scoring individuals have 
greater spread in the Phit values at which they evacuated. Frame D gives the cumulative evacuations, a running sum of the 
data presented in C. We see that higher scoring individuals evacuate more readily, with the noted exception of the fourth 
worst scoring participant, who tended to evacuate much earlier than the others; a strategy that resulted in many unsuccessful 
evacuations. 



Determining the dynamics of decision making strate-
gies from the distribution of evacuations (Fig. 7A) is a
complex problem that can be confounded by various fac- 
tors including the distribution of Phit values observed by 
a participant and individual differences in reaction time. 
To account for these factors we introduce a rate model re- 
lating the number of participants evacuated to the num- 
ber of participants AtHome, and determine how state 
variables such as Phit affect the rate. 

As Phit changes every second in our scenarios, it is nat- 
ural for us to examine the data in one second intervals, 
within which Phit is constant. We then define two indicator
functions that enable us to quantify the number of participants
evacuated and the number of participants AtHome. First, we
define the indicator variable $h^{(r)}_{l,i} = 1$ if participant
$l$ was AtHome at the start of the interval $i$ during run $r$,
and $h^{(r)}_{l,i} = 0$ otherwise (i.e., the participant had
already evacuated). Second, we define the indicator variable
$j^{(r)}_{l,i} = 1$ if participant $l$ evacuated during interval
$i$ on run $r$, and $j^{(r)}_{l,i} = 0$ otherwise. These
quantities are related by the equation:

$$ j^{(r)}_{l,i} = h^{(r)}_{l,i} - h^{(r)}_{l,i+1}. \tag{1} $$
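As a minimal sketch of Eq. 1 for a single participant in a single run, the two indicator sequences can be built from the interval in which the participant evacuated. The function name `indicators` and its arguments are hypothetical and are not taken from the experiment's software.

```python
def indicators(evac_interval, n_intervals):
    """Build the AtHome indicator h and evacuation indicator j for one
    participant in one run (hypothetical helper, for illustration only).

    evac_interval: index of the interval in which the participant evacuated,
                   or None if the participant never evacuated.
    n_intervals:   number of one-second intervals in the run.
    """
    # h[i] = 1 if the participant is AtHome at the start of interval i.
    # One extra entry is kept so that h[i + 1] exists for the last interval.
    h = [
        1 if evac_interval is None or i <= evac_interval else 0
        for i in range(n_intervals + 1)
    ]
    # Eq. 1: j[i] = h[i] - h[i + 1], which is 1 only in the evacuation interval.
    j = [h[i] - h[i + 1] for i in range(n_intervals)]
    return h, j


h, j = indicators(evac_interval=2, n_intervals=5)
```

Because a participant can evacuate at most once per run, `sum(j)` is either 0 or 1 in this sketch.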



We approximate an individual's decision to evacuate
as a Bernoulli process in the following way. First we note
that when $h^{(r)}_{l,i} = 1$, we can model the probability of
evacuating during the interval $i$ as a rate, denoted
$\theta^{(r)}_{l,i}$, where $\theta^{(r)}_{l,i} \in [0, 1]$. We treat
the observed value for the indicator variable $j^{(r)}_{l,i}$ as one
sample of an underlying stochastic process that can take a value
of either 0 or 1. A single sample of the data provides a poor
estimate of the rate $\theta^{(r)}_{l,i}$. However, by modeling the
data as a Bernoulli process, we can estimate the variance in the
rate, based on our limited number of observations. This approach
enables us to derive a decision model without overestimating our
confidence in small samples of data.

We hypothesize that $\theta^{(r)}_{l,i}$ varies in a predictable manner
according to a small set of state variables that capture
the essential decision parameters in the experiment. To
uncover these trends, we partition the data in a number
of ways in this and the following section. In this
section, we combine data for all the participants to ob-
tain aggregate rates for the population as a whole, and
in the following section, we consider heterogeneity in the
evacuation rates of individual participants.

We begin by aggregating the data for specific disaster
likelihoods Phit, which in the experiment can take on
values $\nu \in \{0.0, 0.1, 0.2, \dots, 0.9, 1.0\}$. For each possible
value $\nu$, we determine the total number of intervals in the
aggregate experiment where a participant who is AtHome
observed Phit = $\nu$:

$$ H_\nu = \sum_{l} \sum_{r} \sum_{i:\, P_{\mathrm{hit}} = \nu} h^{(r)}_{l,i}. \tag{2} $$

We likewise determine the total number of times such
participants then evacuated:

$$ J_\nu = \sum_{l} \sum_{r} \sum_{i:\, P_{\mathrm{hit}} = \nu} j^{(r)}_{l,i}. \tag{3} $$

We use the uppercase $\Theta_\nu$ to indicate the evacuation
rate for each value Phit = $\nu$. If we think of $J_\nu$ as a ran-
dom variable (modeled as a sum of Bernoulli variables)
given $\Theta_\nu$ and $H_\nu$, then $J_\nu$ has a binomial distribution.
Conversely, the likelihood of $\Theta_\nu$ given $H_\nu$ and $J_\nu$ has a
Beta($\alpha$, $\beta$) distribution [70], with parameters $\alpha = J_\nu + 1$
and $\beta = H_\nu - J_\nu + 1$. We thus measure rates from the
data using the expected value of this Beta distribution:

$$ \Theta_\nu = E\left[\mathrm{Beta}(J_\nu + 1,\, H_\nu - J_\nu + 1)\right] = \frac{J_\nu + 1}{H_\nu + 2}. \tag{4} $$

The standard deviation of these estimates is given by:

$$ \sigma(\Theta_\nu) = \sqrt{\frac{(J_\nu + 1)(H_\nu - J_\nu + 1)}{(H_\nu + 2)^2 (H_\nu + 3)}}. \tag{5} $$

Given an abundance of data, the measured rate converges
to the more intuitive fraction of evacuations $J_\nu / H_\nu$.
However, when data is limited the approach described
above yields a more accurate description of the evacua-
tion behavior.
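The estimates in Eqs. 4 and 5 are simple enough to compute directly. The following is a minimal sketch; the function name `beta_rate` is hypothetical, and the example counts are illustrative rather than taken from the experiment.

```python
import math


def beta_rate(H, J):
    """Mean (Eq. 4) and standard deviation (Eq. 5) of the
    Beta(J + 1, H - J + 1) distribution.

    H: number of intervals in which an AtHome participant observed the value.
    J: number of those intervals that ended in an evacuation (0 <= J <= H).
    """
    mean = (J + 1) / (H + 2)
    std = math.sqrt((J + 1) * (H - J + 1) / ((H + 2) ** 2 * (H + 3)))
    return mean, std


# With no data at all the estimate is 1/2 with a wide spread, whereas with
# many observations it approaches the raw fraction J / H.
no_data = beta_rate(0, 0)
large_sample = beta_rate(10000, 3000)
```

Note that the raw fraction J / H is undefined when H = 0, while the Beta-based estimate remains well behaved, which is one practical reason to prefer it for sparsely observed values of Phit.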

Fig. 8A shows the estimated $\Theta_\nu$ rates (black dots) as-
sociated with the 11 possible values $\nu$ of the disaster
likelihood Phit. We observe that the rates increase ap-
proximately monotonically with Phit in a manner that is
reminiscent of a Hill function [71]. We therefore model
$\Theta_\nu$ using the following functional form:

$$ \mu(P_{\mathrm{hit}}) = A \, \frac{P_{\mathrm{hit}}^{n}}{P_{\mathrm{hit}}^{n} + k^{n}}, \tag{6} $$

which enables us to describe the decision making dynam-
ics of the population using three parameters. First, A de-
notes the maximum evacuation rate; when Phit is large,
$\mu$ saturates to this value. A can therefore be used to
estimate how quickly participants are able to react to
rapidly changing conditions. Second, the threshold pa-
rameter k represents the half maximum value of Phit,
i.e., $\mu(k) = A/2$. Third, the Hill-parameter n dictates
the steepness of $\mu$ at k. For large values of n (e.g.,
n > 20), $\mu(P_{\mathrm{hit}})$ is threshold-like, being approximately
0 for Phit < k, and approximately A for Phit > k.
For smaller values of n the transition is more gradual.
Threshold policies have been extensively studied in pre-
vious work and are postulated to accurately characterize
individual decision making behaviors in a variety of sce-
narios [21], [37], [72], [73].

FIG. 8. Model rate laws and their variation with shelter ca-
pacity and time pressure. In A we plot the measured rates for
data partitioned only by Phit (black dots with grey bars for
standard deviation), along with the best fit model (Eq. 6). In
B we plot the measured rates for the data further partitioned
by shelter capacity s, along with the best fit model where the
mean threshold k is a linear function of s (k = ms + b). Line
color indicates shelter capacity: s = 10 (red; top), s = 20
(orange), s = 30 (green), s = 40 (blue), and s = 50 (black;
bottom). Not all Phit values were observed in all s value sce-
narios. As bed number decreases, the rate curve shifts left,
giving an increase in evacuation rate at the same Phit. The
model in B displayed systematic inaccuracies requiring par-
titioning the data into three different time scenarios ($\tau = 1$
before 30 seconds in 30 second or greater runs, $\tau = 2$ after
30 seconds in those runs, and $\tau = 3$ for 60 second runs). In
C we plot only the 50-bed curves for the three scenarios and
note that the rates for $\tau = 3$ lie between $\tau = 1$ and 2.
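The Hill function of Eq. 6 and the roles of its three parameters can be sketched in a few lines. The function name `hill_rate` is hypothetical; the parameter values in the example are the aggregate best fit values reported in the text, except for the steep case n = 30, which is chosen only to illustrate threshold-like behavior.

```python
def hill_rate(p_hit, A, k, n):
    """Hill-function evacuation rate (Eq. 6).

    A: maximum evacuation rate, k: half-maximum threshold, n: steepness.
    """
    return A * p_hit**n / (p_hit**n + k**n)


# At the threshold the rate is exactly half of the maximum rate A.
half_max = hill_rate(0.72, A=0.28, k=0.72, n=11.9)

# A large Hill parameter gives a nearly step-like response around k.
below = hill_rate(0.5, A=0.28, k=0.72, n=30)
above = hill_rate(0.9, A=0.28, k=0.72, n=30)
```

In this sketch `below` is close to 0 and `above` is close to A, which is the threshold-like regime described above.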

All models used in the manuscript are fit to the data
by evaluating the model rate at each value $\nu$ of the
disaster likelihood to obtain $\mu_\nu$. We then vary A, k, and
n to maximize the expression:

$$ \sum_{\nu} \left[ (H_\nu - J_\nu) \ln(1 - \mu_\nu) + J_\nu \ln(\mu_\nu) \right], \tag{7} $$

a fit directly to the $H_\nu$ and $J_\nu$ values, not the $\Theta_\nu$ values.
This expression is derived through maximum likelihood
estimation [74] for Beta distributed measurements. The
more common $\chi^2$ minimization for curve fitting is sim-
ilarly derived from maximum likelihood estimation for
Gaussian distributed measurements [74], and our formula
serves the corresponding role.
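To make the fitting criterion concrete, the sketch below evaluates expression (7) and maximizes it by a coarse grid search. The grid search is a stand-in chosen for simplicity; the text does not state which optimizer was used. The names `log_likelihood` and `grid_fit`, the grids, and the noise-free synthetic counts are all assumptions made for illustration.

```python
import math


def hill_rate(p_hit, A, k, n):
    """Hill-function evacuation rate (Eq. 6)."""
    return A * p_hit**n / (p_hit**n + k**n)


def log_likelihood(params, counts):
    """Expression (7). counts maps each Phit value v to the pair (H_v, J_v)."""
    A, k, n = params
    total = 0.0
    for v, (H, J) in counts.items():
        # Clamp the model rate away from 0 and 1 to avoid log(0).
        mu = min(max(hill_rate(v, A, k, n), 1e-12), 1.0 - 1e-12)
        total += (H - J) * math.log(1.0 - mu) + J * math.log(mu)
    return total


def grid_fit(counts, A_grid, k_grid, n_grid):
    """Return the (A, k, n) grid point that maximizes expression (7)."""
    candidates = [(A, k, n) for A in A_grid for k in k_grid for n in n_grid]
    return max(candidates, key=lambda params: log_likelihood(params, counts))


# Synthetic, noise-free counts generated from assumed "true" parameters.
TRUE_PARAMS = (0.3, 0.7, 12)
counts = {
    i / 10: (1000.0, 1000.0 * hill_rate(i / 10, *TRUE_PARAMS))
    for i in range(11)
}
best = grid_fit(counts, [0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [4, 8, 12, 16])
```

Because the synthetic counts contain no sampling noise, the grid point equal to the generating parameters maximizes the expression; with real data a continuous optimizer would normally replace the grid.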

Fitting our model to the measured rates in Fig. 8A, we
obtain k = 0.72±0.03, A = 0.28±0.06, and n = 11.9±1.4. 
The standard deviations reported here were obtained via 
bootstrapping [75] where we constructed synthetic data 
sets by randomly selecting 47 runs with replacement from 
the original data, then aggregating the data and fitting 
the model to the synthetic data using the method de-
scribed above. The best fit model is plotted in Fig. 8A
(solid black line). For most values of Phit, we find that 
this model accurately captures the observed behavior. 
However, we also observe systematic variations between 
the model and the experimental data. One set of vari- 
ations appears to stem from shelter capacity while the 
other appears to stem from temporal urgency for the 
evacuation decision. 
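The run-level bootstrap used for the reported standard deviations can be sketched as follows. For brevity the estimator here is a pooled Beta-mean rate rather than the full Hill-function fit, so this is an illustration of the resampling scheme only; the names `pooled_rate` and `bootstrap_std`, the per-run `(H, J)` representation, and the example counts are assumptions.

```python
import random
import statistics


def pooled_rate(runs):
    """Pool per-run (H, J) counts and return the Beta-mean rate (Eq. 4)."""
    H = sum(h for h, _ in runs)
    J = sum(j for _, j in runs)
    return (J + 1) / (H + 2)


def bootstrap_std(runs, estimator, n_boot=1000, seed=0):
    """Standard deviation of an estimator under resampling of whole runs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw as many runs as in the original data, with replacement.
        resampled = [rng.choice(runs) for _ in range(len(runs))]
        estimates.append(estimator(resampled))
    return statistics.stdev(estimates)


# Illustrative counts for 47 runs (not experimental data).
example_runs = [(20 + (i % 5), i % 4) for i in range(47)]
spread = bootstrap_std(example_runs, pooled_rate)
```

Resampling whole runs, rather than individual one-second intervals, keeps the correlations within a run intact, which is presumably why the bootstrap in the text is organized by run.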

To examine the role of shelter capacity s in decision 
making, we aggregate the data for each of the 11 disaster 
likelihoods Phit at each of the 5 values of shelter capacity 
s. We adapt our use of the subscript $\nu$ to now indicate
this finer-grained aggregation into 11 × 5 sets of data. The
measured rates confirm our expectation that evacuation
rates were high when shelter space was scarce and low
when shelter space was abundant (see Fig. 8B).

To model the role of shelter capacity in modulating the 
average form of the evacuation decision, we consider two 
families of Hill functions based on our previous fits: one 
family drawn from variations in A and a second family 
drawn from variations in k. To guide our choice between 
these two alternatives, we consider optimal decision mak- 
ing behavior. If shelter space is abundant and informa- 
tion is precise, the optimal evacuation decision rule will 
be a threshold-like function in which the value of the 
threshold is just below Phit = 1.0. This behavior ensures 
that the individual evacuates when there is near certainty 
that the disaster will hit the community. If instead there
is limited shelter space and the costs of the two possible
incorrect decisions are equal, the expected evacuation de-
cision rule will also be a threshold-like function, but in
this case the value of the threshold will be just above
Phit = 0.5. This behavior ensures the best chance of get-
ting a bed in the shelter with the least cost associated with a
wrong decision.

Because the threshold value appears critical for opti- 
mal decision making behavior in scenarios of both abun- 
dant and scarce shelter space, we choose the family of 
Hill functions obtained from varying k. We find that the
following linear model of k versus s:

$$ \mu(P_{\mathrm{hit}}, s) = A \, \frac{P_{\mathrm{hit}}^{n}}{P_{\mathrm{hit}}^{n} + (ms + b)^{n}}, \tag{8} $$

fits the data well. In Fig. 8B, we show the set of curves
extracted for the best fit to the model in Eq. 8 alongside
the raw empirical data. The best fit values for k = ms + b
are m = 0.0024 and b = 0.28.

To test the accuracy of this model and to identify sys-
tematic differences between the best fit model and the
data, we compared the predictions of this model to the
data. We found a systematic trend whereby the model
overestimated the number of evacuations occurring prior to
30 seconds in VariableTime runs and underestimated the
number of evacuations occurring after 30 seconds in those
runs. The difference between actual and predicted evacu-
ations was profound and the shift from overestimating
to underestimating was abrupt, occurring at exactly the 30
second mark in nearly every VariableTime run. These re-
sults show that an individual's behavior is additionally
influenced by temporal urgency.

To quantify the effect of temporal urgency, we extend 
our model in the following way. As in the previous ver- 
sions of the model, we aggregate the data for each of the 
11 Phit values at each of the 5 values of shelter capacity s. 
However, in this case we additionally aggregate data for 
the following 3 separate cases with differing temporal ur-
gency: prior to 30 seconds in VariableTime runs ($\tau = 1$),
after 30 seconds in those runs ($\tau = 2$), and all data in
CertainTime runs ($\tau = 3$). We again adapt our use of
the subscript $\nu$ to now indicate this even finer-grained
aggregation into 11 × 5 × 3 = 165 sets of data.

To determine if temporal urgency had a more signifi-
cant effect on A or on the threshold parameters (m and
b), we fit the model in Eq. 8 independently to
the 3 $\tau$ cases. From these fits and the confidence intervals
on the parameter estimates we were able to determine
that the variation of A with temporal urgency was more
significant than the variation of n, m, or b. We there-
fore constrained variation with temporal urgency to A,
adopting a six parameter model:

$$ \mu(P_{\mathrm{hit}}, s, \tau) = A_\tau \, \frac{P_{\mathrm{hit}}^{n}}{P_{\mathrm{hit}}^{n} + (ms + b)^{n}}, \tag{9} $$

which has three $A_\tau$ values. The best fit values are pre-
sented in Table II.
Parameter                            Symbol    Value        STD
Hill-coefficient                     n         9.3          ±1.3
Maximum rates:
  $\tau = 1$                         $A_1$     0.07         ±0.02
  $\tau = 2$                         $A_2$     0.37         ±0.07
  $\tau = 3$                         $A_3$     0.13         ±0.04
Threshold parameters (k = ms + b):
  Offset                             b         0.60         ±0.05
  Proportionality const.             m         2 × 10^-3    ±1 × 10^-3

TABLE II. Parameter estimates for the model in Eq. 9,
with standard deviations obtained via bootstrapping [75].
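The six-parameter rate law of Eq. 9 can be written down directly from the table. In the sketch below the constant and function names are hypothetical, and the slope m = 2e-3 is an assumed reading of the table entry, chosen to agree in magnitude with the value m = 0.0024 reported for Eq. 8.

```python
# Best fit values from Table II; M is an assumed reading of the table entry.
A_TAU = {1: 0.07, 2: 0.37, 3: 0.13}  # maximum rate for each time scenario
N_HILL = 9.3                         # Hill coefficient n
B = 0.60                             # threshold offset b
M = 2e-3                             # threshold slope m (assumption)


def evac_rate(p_hit, s, tau):
    """Six-parameter evacuation rate (Eq. 9) for shelter capacity s and
    time scenario tau in {1, 2, 3}."""
    k = M * s + B  # the threshold increases linearly with shelter capacity
    return A_TAU[tau] * p_hit**N_HILL / (p_hit**N_HILL + k**N_HILL)


# Scarcer shelter space lowers the threshold and raises the rate at fixed Phit.
rate_scarce = evac_rate(0.7, s=10, tau=2)
rate_abundant = evac_rate(0.7, s=50, tau=2)
```

Under these values the threshold for 50 beds is k = 0.7, so `rate_abundant` is half of the maximum rate for that time scenario.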



Figure 8C illustrates the measured rates and model
curves for a characteristic subset of the data (runs with
50 beds) for each of the three time windows ($\tau = 1, 2, 3$).
For this partitioning of the data both the first 30 sec-
onds of VariableTime runs ($\tau = 1$) and the full 60 sec-
onds of CertainTime runs ($\tau = 3$) are described by sim-
ilar low evacuation rates $A_1 = 0.07$ evacuations/second
and $A_3 = 0.13$ evacuations/second, respectively. Both
of these are significantly smaller than the corresponding
rate A = 0.28 evacuations/second for the original aggregated
data (Figure 8A) as well as the rate $A_2 = 0.37$ evacua-
tions/second observed after 30 seconds in the Variable-
Time runs ($\tau = 2$). The increase in rate during the un-
certain window in the VariableTime runs reflects a high
temporal urgency associated with a disaster that could
strike at any moment. It also suggests participants will
respond quickly to changing Phit values under these con-
ditions.

The relatively low values of $A_1$ and $A_3$ are likely due
to the fact that in these cases the disaster strike is only
possible in the last time increment of these partitions,
which implies low temporal urgency. In each case, urgency
increases towards the end of the interval, and this occurs to
a greater degree for $\tau = 3$ (CertainTime) than for $\tau = 1$
(first time window in VariableTime). In CertainTime runs,
the scenario terminates at exactly 60 seconds, so in this
case the last observed Phit value describes the likelihood
of a strike at 60 seconds, whereas in the first 30 seconds
of the VariableTime runs the value of Phit at the end of
the interval reflects the probability of a Hit not necessar-
ily in the next time increment, but rather at some time
within the uncertain 30 second window. We expect this
distinction underlies our observation that $A_3 > A_1$.



Simulations 

We test our decision model by using it to simulate 
evacuation behavior for the 47 scenarios in the behav- 
ioral experiment. The appropriateness of our model can 
then be quantified by the difference between simulated 
and observed behavior, with small differences indicating 
that our model could be used as a generative model in 
future numerical studies. 



In the experiment, each scenario is characterized by a
shelter capacity s and time pressure $\tau$, as well as a pre-
scribed sequence of disaster likelihood values Phit. Using
our decision rule, we can compute the expected rate of
evacuations at each instantaneous value of (s, $\tau$, Phit).
If we initialize every simulation with 50 individuals at
home ($H^{(r)}_0 = 50$), we can compute the expected number
of people AtHome in each interval ($H^{(r)}_i$) using:

$$ H^{(r)}_{i+1} = \left[ 1 - A_\tau \, \frac{P_{\mathrm{hit}}^{n}}{P_{\mathrm{hit}}^{n} + (ms + b)^{n}} \right] H^{(r)}_{i}. \tag{10} $$

In the paragraphs below, we comment briefly on several
key results from our simulations (see Fig. 9).
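The recurrence in Eq. 10 amounts to a one-line update per second. The following is a minimal sketch in which `simulate_at_home` and the example Phit trajectory are hypothetical, and the rate function is passed in so that any of the rate laws above could be used; the parameter values in the example are illustrative only.

```python
def hill_rate(p_hit, A, k, n):
    """Hill-function evacuation rate (Eq. 6)."""
    return A * p_hit**n / (p_hit**n + k**n)


def simulate_at_home(p_hit_seq, rate_fn, h0=50.0):
    """Expected number of participants AtHome at the start of each interval
    (Eq. 10), given one Phit value per one-second interval."""
    at_home = [h0]
    for p_hit in p_hit_seq:
        # A fraction rate_fn(p_hit) of those still AtHome is expected to leave.
        at_home.append((1.0 - rate_fn(p_hit)) * at_home[-1])
    return at_home


# Illustrative trajectory and parameters (not an experimental scenario).
trajectory = [0.3, 0.5, 0.7, 0.8, 0.9]
expected = simulate_at_home(trajectory, lambda p: hill_rate(p, 0.37, 0.7, 9.3))
expected_evacuated = [expected[0] - h for h in expected]
```

The expected cumulative number of evacuations is then the initial population minus the expected number still AtHome, which is the quantity compared against the observed evacuation curves.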

Decision model accurately describes experimental ob- 
servations. In each scenario the simulated behavior ac- 
curately describes the observed behavior. This result is 
striking because our model aggregates the data over all 
participants over all scenarios to a reduced set of six pa- 
rameters, with no time resolution aside from separation 
into the three bins associated with the different time pres- 
sure variables. In the majority of scenarios the simulated 
evacuation behavior is qualitatively, and in many cases 
quantitatively, matched to the observed behavior of ex- 
periment participants. 

We begin our description of Fig. 9 with the three runs
where participants had the most success, 36, 44, and 45. 
As can be seen here and in Fig. IIJA. (far right) , all but a 
single individual made the correct evacuation decision in 
these runs. In run 36, the disaster had a very predictable 
trajectory, gradually increasing in Phit before eventually 
striking. In runs 44 and 45, the disaster had a poor 
likelihood of striking and Phit decayed fairly rapidly. In 
contrast, the most difficult run was number 42. The Phit 
trajectory in this run peaked at 0.9 before the chance of 
a disaster strike rapidly decayed and the run ended with 
a Miss. As can be seen here and in Fig. IIJA. (far left) 
every participant was left either InShelter or InTransit. 

We observed sub-optimal decision making. In general, 
the optimal decision to evacuate in a given scenario de- 
pends not only on the likelihood and volatility of the 
underlying disaster process, as well as on the loss ma- 
trix, but also on the shelter capacity and the decisions of 
other individuals. However, scenarios 1, 2, 3, 4, 37, and 
40 are unusually simple in that participants knew that 
these scenarios would each last exactly 60 seconds, and 
that there was adequate shelter capacity for all partic- 
ipants. These two simplifying factors ensured that the 
actions of other participants were irrelevant. In these 
scenarios, it would be optimal to wait until immediately 
before the potential disaster strike to evacuate. As Fig. 9
indicates, in scenarios 1, 3, and 4, participants did not
follow the optimal strategy; rather a significant number
of participants evacuated well before the end of the sce-
nario. In fact, many participants evacuated after only
approximately 30 seconds. This behavior proved costly
for them in scenarios 3 and 4. Scenarios 2, 37, and 40 are
less conclusive because the strike likelihood Phit in these
scenarios never exceeded 0.5 (and the disaster did not
hit), making it relatively easy to decide not to evacuate.

FIG. 9. A comparison between data and simulation for the 47 scenarios and the best fit six-parameter model defined in Eq. 9.
At each second the Phit value (blue), the shelter capacity, and the time scenario determine the rate used in the simulation, and
the expected number of evacuations is calculated. The model was fit to estimated rates (Eq. 4), not to the time series data
shown here. This extends the ability of the model to predict untested scenarios. The reduction from 2820 rates in the data to
a six-parameter model generated a model with surprising accuracy. The following runs had identical Phit trajectories: (1,35),
(3,46), (8,25), (9,36), (12,26), (13,29), (14,44,45), (15,16,38), (19,43), (22,31,33), (34,37), (39,41), (40,47).

Participant behavior adapts over time. By construc- 
tion, several scenarios contained identical Phit trajecto- 
ries but differed in other parameters. Among these "re- 
peated" disasters, we observe evidence of learning with 
regard to time pressure. In runs 1, 3, and 8 there were 
some unnecessarily early evacuations, but participants 
waited longer to evacuate in the corresponding runs oc- 
curring later in the experiment (runs 35, 46 and 25). 

This observed adaptation could be explained either by 
effects of time pressure or by effects of strike likelihood. 
To determine the dominant driver of the adaptation, we 
compared the evacuation rates in runs 1-8 with those in 
runs 37-40 to determine whether there was evidence for 
adaptation in decision making strategies. While these 
runs differed in strike likelihood, the measured rates ob-
served in the two groups did not show a significant change
at high Phit values. This suggests that although partici- 
pants seemed to adapt their strategies in relation to time 
pressure, they did not adjust their behavior in relation 
to strike likelihood. 

Unexpectedly extreme sensitivity to shelter capacity. 
In each of scenario runs 27 and 29, shelter beds were 
scarce (10 beds for 50 people) and more participants evac- 
uated early in the scenario than our model predicted. It 
is possible that (1) our linear model of the vari-
ation of the threshold k with shelter capacity s is inad-
equate when shelter space is very scarce, (2) time
pressure affects player behavior before 30 seconds in Vari-
ableTime runs with low shelter capacity, or (3) the par-
ticipants were reacting to the immediately preceding runs
(runs 26 and 28), in which a large number of
individuals evacuated after the shelter was full, leaving
those individuals stuck InTransit. The
early evacuations in runs 27 and 29 could therefore be
a reaction to participants being caught InTransit in the
previous run. We are unable to discriminate between
these three possibilities with this data set; we leave this
for future work.



Individual Variation 

Our success in identifying a decision making model 
that captures the observed collective evacuation behav- 
ior in the experiment led us to test whether a similar 
method could differentiate between individual decision 
making strategies. In the previous analyses, we combined 
data for all of the participants, which enabled us to fit 
the model to several experimental variables. Because the 
evacuation data for individual participants is relatively 
sparse, here we focus exclusively on the influence of the 
disaster likelihood Phit in decision making and do not 
separately consider the effect of shelter capacity or time 
pressure. 

To extend the collective decision making model to in-
dividuals we estimated the evacuation rates for each par-
ticipant at each Phit value using Eq. 4. We show this
data in Fig. 10, where individuals are ranked by score
from highest scoring (top left) to lowest (bottom right).
Individuals could have as few as 9 measured rates if they
consistently evacuated before Phit > 0.9 (see truncated
curves in Fig. 10).



Comparing the raw data in Fig. 10 for individuals with
the corresponding measured rates for the aggregate pop-
ulation shown in Fig. 8 illustrates an interesting deviation
in the measurements at high values of Phit. For the ag-
gregate population there is a significant and somewhat
counterintuitive drop in measured rate from Phit = 0.8
to 0.9; the value of the measured rate represented by the
data points at Phit = 0.9 lies below the value represented
at Phit = 0.8. However, while non-monotonicity is ob-
served on the scale of individuals the trend is not system-
atic (see Fig. 10). The difference between the population
and individual fits suggests that the observed drop in the 
measured rate at high Phit in aggregate data is driven by 
heterogeneity in the population. Participants with high 
evacuation rates tend to leave before Phit > 0.9. Those 
who remain and observe high values of Phit typically dis- 
play low evacuation rates, thereby biasing the summary 
rates measured at the population scale. 
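The depletion effect described above can be reproduced in a toy calculation. The sketch below assumes two hypothetical subpopulations that share the same threshold but differ in maximum rate, and a Phit trajectory that steps once through each of the eleven levels; none of these values are fit to the experiment.

```python
def hill_rate(p_hit, A, k, n):
    """Hill-function evacuation rate (Eq. 6)."""
    return A * p_hit**n / (p_hit**n + k**n)


def aggregate_rates(groups, p_levels):
    """Population-level measured rate at each Phit level.

    groups: list of (initial_count, A, k, n), one tuple per subpopulation.
    Each level is observed for one interval, in the order given.
    """
    at_home = [count for count, _, _, _ in groups]
    aggregate = []
    for p_hit in p_levels:
        rates = [hill_rate(p_hit, A, k, n) for _, A, k, n in groups]
        evacuated = [h * r for h, r in zip(at_home, rates)]
        # Expected evacuations divided by expected number still AtHome.
        aggregate.append(sum(evacuated) / sum(at_home))
        at_home = [h - e for h, e in zip(at_home, evacuated)]
    return aggregate


P_LEVELS = [i / 10 for i in range(11)]
# Hypothetical mix: 25 fast responders and 25 slow responders.
GROUPS = [(25.0, 0.6, 0.7, 12), (25.0, 0.05, 0.7, 12)]
rates = aggregate_rates(GROUPS, P_LEVELS)
```

Although each subpopulation's own rate increases monotonically with Phit, the aggregate rate at Phit = 0.9 falls below that at Phit = 0.8 under these assumed parameters, because most fast responders have already left by the time the highest values are observed.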

To capture individual decision making strategies, we
fit a three-parameter Hill function (Eq. 6) to each indi-
vidual's measured rates using Eq. 7. As shown in Fig. 10,
the best fit models based on the Hill function capture the
measured rate curves of each participant with striking ac-
curacy.

Higher evacuation rates accompany better perfor-
mance. The wide range of participant decision making
behavior is clearly visible in Fig. 10. The variability is
especially apparent when we compare the highest scor-
ing individuals with the lowest scoring individuals. The
highest scoring participants exhibit rates that increase
sharply and monotonically, approximately beginning at
Phit = 0.7. The lowest scoring individuals rarely evac-
uate; we observe flat evacuation rate curves, with mea-
sured rates that are much lower and less sys-
tematic in their variations compared to high scoring in-
dividuals. As is apparent from the accuracy of the fits,
this distinction is well captured by our model.

A fundamental goal of our experiment was to identify 
psychological and behavioral predictors of individual per- 
formance. First, we ask whether parameter values from 
the best fit models on individual participants could be re- 
lated to behavioral performance in the experiment. The 
best fit models yielded rates $A \in [0, 1]$, with values for
every individual displayed in Fig. 11. Overall, we observe
a significant positive correlation between the maximum
evacuation rate A in the best fit models and cumulative
score (Pearson r = 0.41, p = 0.0028; see Fig. 11). We
speculate that the maximum evacuation rate could be
related to a participant's fundamental reaction time. If 
true, our results suggest that participants who can react 
quickly to rapidly changing conditions in their environ- 
ment are more successful in the experiment. 

Investment risk seeking may give higher evacuation 
thresholds. To identify potential psychological predic- 
tors of individual performance, we tested whether risk 
scores in the financial domain and health & safety domain 
were related to individual differences in decision making 
strategies. We found a significant relationship between 
k and risk score in the investment domain (r = 0.30, 
p = 0.03), indicating that individuals with higher deci- 
sion thresholds tend to have more risk seeking attitudes. 
We interpret this result with caution due to the possi-
bility of Type I errors in the large number of tests per-
formed (3 risk scores and 3 best fit model parameters =
9 tested correlations). However, a correlation between 
these two variables is plausible; it suggests that partici- 
pants who tolerate more financial risk are more likely to 
wait until the disaster is imminent before evacuating. 

Similar decision models can produce different scores. 
It is noteworthy that some low and intermediate scoring 
participants display reduced (binned) decision statistics, 
and consequently decision model parameters, that are 
almost identical to those of the highest scoring partic- 
ipants. For example, participants 1 and 36 have very 
similar decision models but very different scores (2590 
and 2270). This result indicates that in some cases simi- 
lar decision making strategies can produce very different 
performance outcomes. 

Our decision model reduces the data to a single sce-
nario parameter (Phit) and therefore fails to capture
other features that are likely to be important in distin-
guishing between individuals such as timing of the deci-
sion. Our data on the population scale suggested that
time pressure and shelter capacity are important vari-
ables and likely have similar importance on the scale of
individuals. By comparing the detailed time evolution of
individual runs, we observe instances where higher scor-
ing participants tended to wait longer before evacuating
than lower scoring participants, a more successful strat-
egy.

While we are unable to quantify these effects with
significance in the current experiment due to limited data, our
model provides a tool for estimating the quantity of data
needed to robustly quantify these parameters in driving
individual decision dynamics.

FIG. 10. A comparison between the decision making model and data from the behavioral experiment for each participant,
ranked according to cumulative score. Evacuation rates for each individual at each Phit value were measured using Eq. 4. These
values are plotted in blue accompanied by the estimated standard deviations for each point (grey bars) calculated based on
Eq. 5. Hill functions were fit for each individual using the routine described in Eq. 7 (dotted black). Higher evacuation rates
tend to result in higher scores. The fits give a significant correlation between evacuation rate A and score (Pearson r = 0.41,
p = 0.0028). Moreover, individuals evidencing higher financial risk attitude scores (i.e., more risk seeking) have higher
thresholds for evacuation k than individuals evidencing lower financial risk attitude scores (Pearson r = 0.30, p = 0.03).



DISCUSSION 

The behavioral network science experiment reported 
in this paper quantifies several key factors influencing in- 
dividual evacuation decision making in a controlled lab- 
oratory setting. The experiment includes tensions be- 
tween broadcast and peer-to-peer information, and con- 
trasts the effects of temporal urgency associated with the 
imminence of the disaster and the effects of limited shel- 
ter capacity for evacuees. In this section we summarize 
our key findings, discuss several methodological consid- 
erations, and describe implications for future work. 



Predictive, scalable model of collective and 
individual human decision making 

Based on empirical measurements of the cumulative 
rate of evacuations as a function of the instantaneous 
disaster likelihood, we developed a quantitative model 
for decision making that captures remarkably well the 
main features of observed collective behavior across the 
47 disaster scenarios. Moreover, we are able to capture 
the sensitivity of individual and population level deci- 
sion behaviors to external pressure on resources (limited 
shelter capacity) and time (imminence of disaster). Sys- 
tematic deviations from the model provide meaningful 
estimates of variability in the collective response. Our 
analysis uncovers a temporal evolution in individual be- 
havior over the course of the experiment, indicative of 
increasing attention and swiftness of response, and con- 
sistent with the expectation that individuals learn from 
previous incidents. 

Data from the experiment reveal significant heterogeneity 
in individual decision making patterns, captured 
by significant variation in model parameter fits to 
participants. The results distinguish between high scoring 
individuals, whose decisions to evacuate are strongly linked 
to a tight range of disaster likelihoods, and others who 
exhibit significantly more variable decision making patterns 
and did not score as well in the experiment. Both 
the individuals' overall success rate in the experiment 
and the decision making variables that model their 
behavior are correlated with heterogeneities in individual 
risk attitudes, as measured by established psychological 
tests. 

FIG. 11. Best fit models provided values for the maximum 
evacuation rate A for each individual. The distribution of A 
values across participants spanned almost the full range from 
0 to 1. Here we observe a significant correlation between 
A values and cumulative score across participants (Pearson 
r = 0.41, p = 0.0028). This result provides statistical support 
for the apparent tendency for high scoring individuals to also 
display higher rate values (see Fig. 10). 

These results suggest new directions for numerical 
modeling. For example, simulation studies that extrapolate 
decision making strategies identified in small groups 
to larger collectives could more accurately predict 
behavior in large scale populations and coalitions. 
Additionally, simple mathematical models are needed to better 
understand the tensions and tradeoffs identified in this 
experiment. Effects of competing broadcast and social 
information in collective decision dynamics have been 
investigated previously in a numerical simulation, where 
individuals were represented by nodes in a network and 
obtained information from a broadcast source as well as 
neighboring sites in the network [37]. In that case, 
decision making was modeled as a threshold on an individual 
state variable representing opinion, and the opinion of 
each individual was updated based on a stochastic contact 
rule with the broadcast source (essentially a warning 
that the disaster was coming) and other individuals (who 
might or might not have received any information about 
the disaster). The results presented in this paper suggest 
important extensions to that model that (1) incorporate 
different types of information from broadcast and social 
sources, including an underlying physical process involving 
likelihood and urgency, and (2) directly implement the 
individual decision model developed in this study rather 
than assuming the more simplistic update rule employed 
there. 



Methodological considerations 



While no laboratory experiment can fully capture the 
tensions associated with a true disaster, known factors 
influencing human risk perception and urgency were ac- 
counted for wherever possible in the experimental design. 
These include both linguistic and visual elements, which 
are well studied in the psychology and risk literature. 
Examples include the use and representation of disas- 
ter likelihood rather than probability, as well as scores 
for each scenario represented in terms of a potential loss 
rather than a payoff for a scenario. Previous studies have 
shown that humans respond differently to losses than 
gains [SSI \E7\ J and are significantly more accurate in deci- 
sion making based on data presented as likelihoods than 
on data presented as probabilities [58, 59]. 

The changing likelihood presented to the participants 
in this study represents the uncertain and highly variable 
physical processes that govern the real time approach 
of natural disasters, such as wildfires or hurricanes 
[m [521 [SI F76ti78], and that ultimately result in either 
a "Hit" or a "Miss" for individual homeowners or com- 
munities. The existence of an underlying, quantifiable 
process for the disaster introduces objective parameters 
that govern volatility, difficulty, and uncertainty that can 
be varied in the experiment. Higher volatility, as well as 
variable time steps, leads to an outcome that is more dif- 
ficult to predict. Based on the rules of the process, it is 
possible to calculate the likelihood of the disaster at each 
time increment (which is the only aspect of the process 
presented to the participants in this experiment, and it 
is presented at limited resolution), as well as the optimal 
evacuation decision (in the absence of shelter capacity 
limitations) W|. 
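
As a rough illustration of how a likelihood can be computed from the rules of an underlying process, the sketch below assumes a hypothetical one-dimensional Gaussian random walk: a "Hit" occurs if the walker ends within a fixed radius of the community after a known number of steps. This is not the process used in the experiment, and the function names and parameters (sigma, radius, x0, resolution) are illustrative assumptions only. 

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hit_likelihood(x: float, steps_left: int, sigma: float, radius: float) -> float:
    """Probability that a Gaussian random walk now at x ends with |x| < radius.

    Hypothetical process: each remaining step adds N(0, sigma^2) noise, so the
    final position is N(x, sigma^2 * steps_left).
    """
    if steps_left == 0:
        return 1.0 if abs(x) < radius else 0.0
    s = sigma * math.sqrt(steps_left)
    return normal_cdf((radius - x) / s) - normal_cdf((-radius - x) / s)


def simulate_scenario(n_steps=60, sigma=1.0, radius=3.0, x0=5.0,
                      resolution=0.1, seed=None):
    """Simulate one scenario of the assumed process.

    Returns the likelihood trajectory as displayed to a participant (rounded
    to the given resolution) and whether the scenario ends in a Hit.
    """
    rng = random.Random(seed)
    x = x0
    shown = []
    for t in range(n_steps + 1):
        p = hit_likelihood(x, n_steps - t, sigma, radius)
        # Participants see only a coarse-grained likelihood value.
        shown.append(round(round(p / resolution) * resolution, 6))
        if t < n_steps:
            x += rng.gauss(0.0, sigma)  # volatility is controlled by sigma
    return shown, abs(x) < radius


if __name__ == "__main__":
    trajectory, hit = simulate_scenario(seed=1)
    print(trajectory[:10], "Hit" if hit else "Miss")
```

In this sketch, sigma plays the role of volatility: larger values allow the displayed likelihood to swing further between updates, while resolution controls how coarsely the likelihood is shown. 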

The details of this process were deliberately hidden 
from the participants, who were only presented with the 
current estimated likelihood of the disaster hitting their 
community, updated at one second intervals. Our deci- 
sion to obscure most of the details from the participants 
was based on observations of realistic disaster event sce- 
narios where the public has access to limited information 
about the disaster likelihood. The complexities of geo- 
physical events are commonly reduced to highly simpli- 
fied trajectories and "likelihoods" when presented to the 
public, whether it be the chances of rain or the chances 
of a disaster [75]. 

In any behavioral experiment, it is of interest to com- 
pare participants' actual behavior to optimal behavior 
from a profit-maximization perspective. In our exper- 
iment, the optimal evacuation time depends both on 
the volatility of the disaster process and on the poten- 
tially confounding actions of other participants. While 
the choice of an underlying stochastic process in principle 
allows for the calculation of a limiting theoretical 
optimal decision strategy [68], our results demonstrate 
that human behavior departs from optimality at a more 
primitive level. As previously discussed, even in the sim- 
plest cases where an optimal strategy is easily obtained 
(i.e., where there is no competition for shelter space, and 
the time of the possible disaster strike is known in ad- 
vance), the participants still act sub-optimally. This re- 
sult highlights the critical importance of uncovering pre- 
dictive models of the suboptimal decision strategies that 
humans employ in real and laboratory settings. 



A framework for quantitative analysis and 
prediction of human behavior in disasters 

In the development and assessment of policy for dis- 
aster mitigation and response, human behavioral factors 
are often the least well quantified, understood, and mod- 
eled. Plans for evacuation based on broadcast communi- 
cation and transportation alone can be rendered ineffec- 
tive if humans do not act as expected. In retrospective 
analysis of data from recent events [551 HSl IS^HM] , pre- 
diction and planning for human social factors have been 
identified as the critical missing link in developing effective 
strategies to ensure the safety of the population as a 
whole. As a result, critical resources are diverted to in- 
dividual crisis hot spots that might have been avoided 
with a more effective plan, and in many cases lives are 
ultimately lost. 

These shortcomings motivate our investigations, which 
represent the initial steps in development of a compre- 
hensive, predictive framework that incorporates human 
factors in policy and planning for disaster mitigation and 
response. Success in this area mandates an iterative 
approach that combines numerical modeling with con- 
trolled experiments and retrospective analysis of data col- 
lected from actual disasters. Our study uncovered mul- 
tiple drivers of individual decision making behavior from 
competing information sources. The social network as 
a whole provided a source of information on shelter oc- 
cupancy, inducing a sense of urgency in the population, 
while the topology of the network surrounding a given in- 
dividual (i.e., the number of that individual's neighbors) 
swayed the time spent engaging the social network. De- 
spite these influences, individual participants spent the 
majority of their time consuming the broadcast informa- 
tion, and the disaster likelihood was the primary factor 
influencing decision making strategies in the population 
as a whole. 

The observed tensions between the two sources of in- 
formation are consistent with empirical observations of 
human behavior in real disasters. Outside of the labora- 
tory setting, the likelihood of a disaster event is clearly 
a dominant factor in any decision to evacuate, and individuals 
spend a great deal of time gathering information 
from television and other media broadcast sources, even 
if updates are slow. However, social media and peer- 
to-peer communication networks are playing an increas- 
ingly important role in transmission of early warnings 
by on-site observers who may communicate observations 
informally via Twitter and Facebook [50] (e.g., news of 
a 2011 earthquake in the Washington D.C. area propa- 
gated faster on social networks than the seismic waves 
themselves [HIl IHlj). Furthermore, in some cases, such 
as developing countries, widespread access to broadcast 
networks may not be readily available, necessitating that 
policy makers rely on social means to communicate in- 
formation updates. Future experiments will change how 
participants access information in order to investigate 
these situations, and elucidate the corresponding effects 
on behavior. 

Additionally, in many (if not most) cases social fac- 
tors underlie the decisions of individuals who evacuate 
early or fail to evacuate even when the disaster is upon 
them [3?1 I3S1 [SI]. For example, families with small children 
tend to leave early, while caring for the elderly or 
reluctance to leave pets behind are often cited as reasons 
for not evacuating. These factors could be incorporated 
in future experiments using an explicit payoff structure 
that rewards collective decisions of neighbors in the social 
network. Another observed source of variation in evac- 
uations during disasters can be traced to heterogeneities 
in age, health, isolation, and socioeconomic status within 
the population. These factors influence speed and access 
to transportation, as well as potential losses associated 
with assets at risk. Such sources of variation may be 
incorporated in our framework by introducing explicit 
heterogeneity in the loss matrix and in the scenarios ac- 
cessible to a participant during the InTransit phase. 

Finally, our work highlights the role that individuality 
plays in the decisions of participants and their effect on 
collective behavior. The distribution of risk tendencies 
in this experiment might be related to the demographics 
of the cohort studied here (UCSB undergraduates), and 
future studies utilizing different participant groups could 
be used to probe such a relationship. For example, it is 
reasonable to expect that older and wealthier individu- 
als (e.g., homeowners) might be more risk averse in this 
domain than undergraduate students. Furthermore, par- 
ticipants who are explicitly trained in risk management 
and/or operate within different organizational structures 
(e.g., military officers) might employ different decision 
making strategies, and a group of such participants might 
by extension display a quantitatively different collective 
behavior profile. 

Our combined use of a novel experimental paradigm 
and powerful theoretical modeling techniques to identify 
and quantitatively characterize individual differences in 
human decision making strategies in social groups could 
form a critical bridge to key work in the fields of social 
neuroscience IS3] and neuroeconomics [511[5S|, which seek 
to describe neurophysiological correlates of social and 
economic considerations driving human decision making. 
Indeed, human neuroimaging studies highlight the role of 
specific brain regions in economic choices and variations 
in decision strategies [551 HZ]. Individual differences in 
these circuits could underlie behavioral decision phenotypes 
in healthy and diseased clinical populations [88, 89]. 
Uncovering neurophysiological predictors of decision dy- 
namics in social groups would have far-reaching implica- 
tions for disaster preparation and response, marketing, 
and homeland security. 



Development of strategies to mitigate or manage 
collective evacuation behavior 



The ultimate goal of our investigations is development 
and testing of robust strategies for training and control 
of evacuations that account for human behavior and net- 
work topologies. These objectives may be incorporated 
within our framework across both broadcast and social 
channels. Broadcast information may include specific 
timing for public release of information, including like- 
lihood updates and incentives as well as warnings and 
mandates for evacuation. In the peer-to-peer communi- 
cation network, strategies for robust control and poten- 
tial fragilities of collective behavior may be investigated 
through insertion of trained "leaders," who make opti- 
mal decisions at different locations in the network, as 
well as through tracing the propagation of deliberately 
injected misinformation and poor decisions. Results ob- 
tained for these "designed" strategies may be compared 
to emergent leadership that might arise when the ranking 
and decisions of other individuals in the network are 
communicated through the social network, an inherent 
source of feedback which has been traced to the initia- 
tion of cascades in social decision making in a wide range 
of applications ^T\ . 